Goal
Learn about linked animations in R + plotly.
Main reference
https://plotly-r.com/client-side-linking.html#linking-animated-views
"This section focuses on a particular approach to linking views known as graphical (database) queries using the R package plotly."
"With plotly, one can write R code to pose graphical queries that operate entirely client-side in a web browser (i.e., no special web server or callback to R is required)."
"In addition to teaching you how to pose queries with the highlight_key() function, this section shows you how to control how queries are triggered and visually rendered via the highlight() function."
(Emphasis and formatting mine).
First example
Take a data frame called mtcars.
mtcars %>% head() %>% print()
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The option highlight_key(~cyl) lets you hover to a data point, and highlight the points with the same cyl values.
Use highlight() to customize the various interactions e.g. hover, single click, double click.
In the background, highlight(on="plotly_hover") performs a SQL query of the form: SELECT * FROM mtcars WHERE cyl IN $SELECTION_VALUE.
library(plotly)
mtcars %>%
highlight_key(~cyl) %>%
plot_ly(
x = ~wt, y = ~mpg, text = ~cyl, mode = "markers+text",
textposition = "top", hoverinfo = "x+y"
) %>%
highlight(on = "plotly_hover", off = "plotly_doubleclick")
More involved example
Now, load the txhousing dataset, whose rows are monthly median sales of real estate in regions of Texa:
data(txhousing, package = "ggplot2")
txhousing %>% head() %>% print()
## # A tibble: 6 x 9
## city year month sales volume median listings inventory date
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abilene 2000 1 72 5380000 71400 701 6.3 2000
## 2 Abilene 2000 2 98 6505000 58700 746 6.6 2000.
## 3 Abilene 2000 3 130 9285000 58100 784 6.8 2000.
## 4 Abilene 2000 4 98 9730000 68600 785 6.9 2000.
## 5 Abilene 2000 5 141 10590000 67300 794 6.8 2000.
## 6 Abilene 2000 6 156 13910000 66900 780 6.6 2000.
Now, we'll create an interactive map:
First, declare city as the SQL 'query by' column. THIS IS THE ONLY INTERACTIVE ELEMENT IN THIS EXAMPLE.
tx <- highlight_key(txhousing, ~city)
Then, initiate a plotly object called base, then pipe it through some plotly functions to add features.
# initiate a plotly object
base <- plot_ly(tx, color = I("black")) %>% group_by(city)
base %>%
group_by(city) %>%
add_lines(x = ~date, y = ~median) -> time_series
Linking
Now, (finally) we get to linking.
One obvious thing from playing around with the above application is that not every city has complete pricing information (e.g., South Padre Island, San Marcos, etc).
To learn more about what cities are missing information as well as how that missingness is structured, we aim to link - a view of the raw time series - to a dot-plot of the corresponding number of missing values per city.
In addition to making it easy to see how cities rank in terms of missing house prices, it also provides a way to query the corresponding time series (i.e., reveal the structure of those missing values) by brushing cities in the dot-plot.
This general pattern of linking aggregated views of the data to more detailed views fits the famous and practical information visualization advice from Shneiderman (1996): “Overview first, zoom and filter, then details on demand”.
data(txhousing, package = "ggplot2")
tx <- highlight_key(txhousing, ~city)
base <- plot_ly(tx, color = I("black")) %>%
group_by(city)
time_series <- base %>%
group_by(city) %>%
add_lines(x = ~date, y = ~median)
dot_plot <- base %>%
summarise(miss = sum(is.na(median))) %>%
filter(miss > 0) %>%
add_markers(
x = ~miss,
y = ~forcats::fct_reorder(city, miss),
hoverinfo = "x+y") %>%
layout(
xaxis = list(title = "Number of months missing"),
yaxis = list(title = "")) %>%
highlight(on = "plotly_click", dynamic = TRUE, selectize = TRUE)
dot_plot
time_series
subplot(dot_plot,
time_series,
widths = c(.2, .8), titleX = TRUE) %>%
layout(showlegend = FALSE) %>%
highlight(on = "plotly_selected", dynamic = TRUE, selectize = TRUE)
(I can't get these to link; this is maybe because the documentation does not specify what time_series is, so I'm just guessing.)
Some takeaways - only one highlight() is possible for each visualization. - subplot(a,b) links two plotly plots a and b. layout() is formatting for the subplots.
Q: I can't figure out what the background grey is doing.
How does plotly know to highlight the time series when markers in the dot-plot are selected? The answer lies in what data values are embedded in the graphical markers via highlight_key().
What happens in the background? First imagine a linked database query being performed behind the scenes:
When ‘South Padre Island’ is selected, it first filters the aggregated dot-plot data down to just that one row, then it filters down the raw time-series data down to every row with ‘South Padre Island’ as a city.
The drawing logic will then call Plotly.addTrace() with the newly filtered data which adds a new graphical layer representing the selection, allowing us to have finely-tuned control over the visual encoding of the data query.
The biggest advantage of drawing an entirely new graphical layer with the filtered data is that it becomes easy to leverage statistical trace types for producing summaries that are conditional on the query; here is another example of doing this with histograms:
hist <- add_histogram(
base,
x = ~median,
histnorm = "probability density"
)
subplot(time_series, hist, nrows = 2) %>%
layout(barmode = "overlay", showlegend = FALSE) %>%
highlight(
dynamic = TRUE,
selectize = TRUE,
selected = attrs_selected(opacity = 0.3)
)
Highlight vs Filter
A highlight event dims the opacity of existing marks, then adds an additional graphical layer representing the selection.
A filter event completely remove existing marks and rescales axes to the remaining data.
(Filtering uses the crosstalk R library.)
library(crosstalk)
# generally speaking, use a "unique" key for filter,
# especially when you have multiple filters!
tx <- highlight_key(txhousing)
gg <- ggplot(tx) + geom_line(aes(date, median, group = city))
## 1. FILTER using filter_select() + bscols()
filter <- bscols(
filter_select("id", "Select a city", tx, ~city), ## A DIRECT
ggplotly(gg, dynamicTicks = TRUE),
widths = c(12, 12)
)
filter
## 2. HIGHLIGHT using highlight()
tx2 <- highlight_key(txhousing, ~city, "Select a city")
gg <- ggplot(tx2) + geom_line(aes(date, median, group = city))
select <- highlight(
ggplotly(gg, tooltip = "city"),
selectize = TRUE, persistent = TRUE
)
select
Some takeaways:
bscols() is for making side-by-side HTML elements. This will be important
- The
highlight functionality is from the plotly side; the filter_select functionality heavily borrows from crosstalk.
Multiple widgets
Show only a subset of years or cities, and also only show medians with sales in some range:
library(crosstalk)
tx <- highlight_key(txhousing)
widgets <- bscols(
widths = c(12, 12, 12),
filter_select("city", "Cities", tx, ~city),
filter_slider("sales", "Sales", tx, ~sales),
filter_checkbox("year", "Years", tx, ~year, inline = TRUE)
)
bscols(
widths = c(4, 8), widgets,
plot_ly(tx, x = ~date, y = ~median, showlegend = FALSE) %>%
add_lines(color = ~city, colors = "black")
)
As Figure 16.9 demonstrates, filter and highlight events can work in conjunction with various htmlwidgets. In fact, since the semantics of filter are more well-defined than highlight, linking filter events across htmlwidgets via crosstalk should generally be more well-supported.
eqs <- highlight_key(quakes)
stations <- filter_slider(
"station", "Number of Stations",
eqs, ~stations
)
p <- plot_ly(eqs, x = ~depth, y = ~mag) %>%
add_markers(alpha = 0.5) %>%
highlight("plotly_selected")
library(leaflet)
map <- leaflet(eqs) %>%
addTiles() %>%
addCircles()
bscols(
widths = c(6, 6, 3),
p, map, stations
)
Some thoughts
I'm guessing we will have to make a choice (this is really a luxury) between the widget functionality that Shiny provides, and what we can do directly in conjunction with plotly. The last paragraph of chapter 16.5 says this:
"16.5 Limitations
The graphical querying framework presented here is for posing database queries between multiple graphs via direct manipulation. For serious statistical analysis, one often needs to link other data views (i.e., text-based summaries, tables, etc) in other arbitrary ways. For these use cases, the R package shiny makes it very easy to build on concepts we’ve already covered to build more powerful client-server applications entirely in R, without having to learn any HTML, CSS, or JavaScript. The next Chapter 17 gives a brief introduction to shiny, then dives right into concepts related to linking plotly graphics to other arbitrary views."
So, it seems to me that both chapters 16 and 17 are requisite readings.
Finally, Animations
"The graphical querying framework (Section 16.1) works in tandem with key-frame animations Section (14)."
library(gapminder)
gapminder %>% print()
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # … with 1,694 more rows
The dataset contains rows equal to unique country-years.
The features are continent, lifeExp, pop, and gdpPercap.
The following animation does a few things: - Draws a scatter plot over time frame=year, of gdpPercap and lifeExp. - Colors by continent color=continent. - Highlights the animation by continent g <- highlight_key(gapminder, ~continent). + Hovering is enabled by highlight(..., "plotly_hover"). + If you put an - Adds a best-fit line geom_smooth(method='lm') between gdpPercap and lifeExp.
g <- highlight_key(gapminder, ~continent)
gg <- ggplot(g, aes(gdpPercap, lifeExp, color = continent, frame = year)) +
geom_point(aes(size = pop, ids = country)) +
geom_smooth(se = FALSE, method = "lm") +
scale_x_log10()
highlight(ggplotly(gg), "plotly_hover")
Linked Animations
(We first need to install an R package from github using Rscript -e "devtools::install_github('cpsievert/plotly_book')".)
Now, let's try to link two animations.
## data(gap, package = "plotlyBook")
load("./gap.rda") ## from https://github.com/cpsievert/plotly_book/blob/master/data/gap.rda
gap %>% print()
## # A tibble: 1,704 x 8
## country continent year lifeExp pop gdpPercap area popDen
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 647500 13.0
## 2 Afghanistan Asia 1957 30.3 9240934 821. 647500 14.3
## 3 Afghanistan Asia 1962 32.0 10267083 853. 647500 15.9
## 4 Afghanistan Asia 1967 34.0 11537966 836. 647500 17.8
## 5 Afghanistan Asia 1972 36.1 13079460 740. 647500 20.2
## 6 Afghanistan Asia 1977 38.4 14880372 786. 647500 23.0
## 7 Afghanistan Asia 1982 39.9 12881816 978. 647500 19.9
## 8 Afghanistan Asia 1987 40.8 13867957 852. 647500 21.4
## 9 Afghanistan Asia 1992 41.7 16317921 649. 647500 25.2
## 10 Afghanistan Asia 1997 41.8 22227415 635. 647500 34.3
## # … with 1,694 more rows
The dataset is a table whose rows are country-years, and contains features such as life expectancy, gdp per capita, area and population density.
The following animation does a few things:
Highlighting key is by country gapKey <- highlight_key(gap, ~country).
- Basically, these are two different representations of one data frame; each row is one data point in both plots, but the x and y axes are different.
- p1 Draws a scatterplot of country vs population density
plot_ly(gap, y = ~country, x = ~popDen, hoverinfo = "x")
p2 Draws a scatterplot between per capita gdp vs life expectancy, with points sized by popDen.
Animation is over years, since frame=~year in both p1 and p2
gapKey <- highlight_key(gap, ~country)
p1 <- plot_ly(gap, y = ~country, x = ~popDen, hoverinfo = "x") %>%
add_markers(alpha = 0.1, color = I("black")) %>%
add_markers(
data = gapKey,
frame = ~year,
ids = ~country,
color = I("red")
) %>%
layout(xaxis = list(type = "log"))
p2 <- plot_ly(gap, x = ~gdpPercap, y = ~lifeExp, size = ~popDen,
text = ~country, hoverinfo = "text") %>%
add_markers(color = I("black"), alpha = 0.1) %>%
add_markers(
data = gapKey,
frame = ~year,
ids = ~country,
color = I("red")
) %>%
layout(xaxis = list(type = "log"))
subplot(p1, p2, nrows = 1, widths = c(0.3, 0.7), titleX = TRUE) %>%
hide_legend() %>%
animation_opts(1000, redraw = FALSE) %>%
layout(hovermode = "y", margin = list(l = 100)) %>%
highlight(
"plotly_selected",
color = "blue",
opacityDim = 1,
hoverinfo = "none"
)
What's next?
Next steps:
- Try out the examples in 16.4 (in a similar fashion/rigor as above).
- Go over chapter 17.
- Read frame-wise animations Section (14).
Some questions to answer (SH: Henry please follow up with me on these points.)
- Can we link animations coming from more than one data frame? or do we need to somehow make a giant data frame.
- How to use these tools for visualizing our data?
- How to use widgets with
subplots()?
- How to host in a Shiny app?
- What can we use from native shiny, and what can we use from plotly (i.e. building widgets entirely using plotly).
An example using our data
Load our data.
filedir = "./res.RDS"
res = readRDS(filedir)
Make some scatterplots:
## Summarize the ylist by collapsing the counts into two dimensions
devtools::load_all('~/repos/flowcy/flowcy')
## Use the functionality in the "flowcy" R package.
par(mfrow = c(1, 3))
tt = 50
library(dplyr)
for(dims in list(c(1,2), c(2,3), c(3,1))){
one_dim_scatterplot(res$ylist, obj=NULL, tt=tt,
countslist = res$countslist, cex_fac=20,
dims = dims)
}

Now, make an animation:
times = 200:296
fac = 2
## First pair of dimensions
dims = c(1:2)
ylist_summary <- Map(collapse_3d_to_2d, res$ylist, res$countslist,
dims = rep(list(dims), length(res$ylist)))[times]
ylist_summary <- lapply(ylist_summary, as_tibble)
ylist_summary <- Map(function(dat, nm){cbind(dat, time=nm)},
ylist_summary, names(ylist_summary))
ymat = do.call(rbind, ylist_summary)
ymat$counts = ymat$counts/max(ymat$counts) %>% sqrt() * fac
p1 = plot_ly(ymat, x= ~diam_mid,y = ~chl_small,
marker = list(size = ~counts, opacity=0.3),
frame = ~time,
col="blue")
## p1 %>% animation_opts(50, redraw = FALSE)
## Second pair of dimensions
dims = c(2:3)
ylist_summary <- Map(collapse_3d_to_2d, res$ylist, res$countslist,
dims = rep(list(dims), length(res$ylist)))[times]
ylist_summary <- lapply(ylist_summary, as_tibble)
ylist_summary <- Map(function(dat, nm){cbind(dat, time=nm)},
ylist_summary, names(ylist_summary))
ymat = do.call(rbind, ylist_summary)
ymat$counts = ymat$counts/max(ymat$counts) %>% sqrt() * fac
p2 = plot_ly(ymat, x= ~chl_small,y = ~pe,
marker = list(size = ~counts, opacity=0.3),
frame = ~time,
col="blue")
## Third pair of dimensions
dims = c(3,1)
ylist_summary <- Map(collapse_3d_to_2d, res$ylist, res$countslist,
dims = rep(list(dims), length(res$ylist)))[times]
ylist_summary <- lapply(ylist_summary, as_tibble)
ylist_summary <- Map(function(dat, nm){cbind(dat, time=nm)},
ylist_summary, names(ylist_summary))
ymat = do.call(rbind, ylist_summary)
ymat$counts = ymat$counts/max(ymat$counts) %>% sqrt() * fac
p3 = plot_ly(ymat, x= ~pe,y = ~diam_mid,
marker = list(size = ~counts, opacity=0.3),
frame = ~time,
col="blue")
subplot(p1, p2, p3) %>% animation_opts(50, redraw = FALSE) %>% layout(width=1500, height=500)